
Preface

Next-generation sequencing (NGS) data analysis is the only way to make sense of the
massive genomic data produced by high-throughput sequencing technologies and
accumulated, in gigabytes and terabytes, on our hard drives and in cloud databases.
With computational resources and elegant algorithms for NGS data analysis now
available, scientists need to master the tools of these analyses to achieve the goals
of their research. Learning NGS data analysis techniques has already become one of the
most important assets that bioinformaticians and biologists must acquire to keep
abreast of progress in modern biology and to avail themselves of the genomic
technologies and resources that have become the de facto standard in bioscience
research and applications, including diagnosis, drug and vaccine discovery, medical
studies, and the investigation of pathways that give clues to many biological
activities and to the pathogenicity of diseases.

In the last two decades, the progress of next-generation sequencing has made a strong
positive impact on human life and marked a forward stride in human civilization. The
introduction of new sequencing technologies has revolutionized bioscience. As a result,
a new field of biology called genomics has emerged. Genomics focuses on the
composition, structure, functional units, evolution, and manipulation of genomes, and
it generates massive amounts of data that need to be ingested and analyzed. As a
consequence, bioinformatics has also emerged as an interdisciplinary field of science
to address the specific needs of data acquisition, storage, processing, and analysis,
and of integrating those data into a broad pool that enriches genomic research.

This book is designed primarily as a companion for researchers and graduate students
who use sequencing data analysis in their research, and it also serves as a textbook
for teachers and students in biology and bioscience. It contains up-to-date material
on the subject, covering most NGS applications and meeting the requirements of a
complete semester course. The reader will find that this book digs deep into the
analysis, providing both concept and practice to satisfy the needs of researchers who
seek to understand and use NGS data preprocessing, genome assembly, variant discovery,
gene profiling, epigenetics, and metagenomics. The book does not present the analysis
pipelines as a black box, as existing books do; instead, it works through the analysis
steps of each topic in detail to provide readers with the scientific and technical
background that enables them to conduct the analysis with confidence and understanding.

The book consists of eight chapters. All chapters include real-world worked examples

that demonstrate the steps of the analysis workflow with real data downloadable from the